Search results for "Computer Science - Computation and Language"

showing 10 items of 31 documents

Pretraining and Fine-Tuning Strategies for Sentiment Analysis of Latvian Tweets

2020

In this paper, we present various pre-training strategies that aid in improving the accuracy of the sentiment classification task. We, at first, pre-train language representation models using these strategies and then fine-tune them on the downstream task. Experimental results on a time-balanced tweet evaluation set show the improvement over the previous technique. We achieve 76% accuracy for sentiment analysis on Latvian tweets, which is a substantial improvement over previous work.

Computer Science - Computation and Language

researchProduct

Distributed Real-Time Sentiment Analysis for Big Data Social Streams

2014

Big data trend has enforced the data-centric systems to have continuous fast data streams. In recent years, real-time analytics on stream data has formed into a new research field, which aims to answer queries about "what-is-happening-now" with a negligible delay. The real challenge with real-time stream data processing is that it is impossible to store instances of data, and therefore online analytical algorithms are utilized. To perform real-time analytics, pre-processing of data should be performed in a way that only a short summary of stream is stored in main memory. In addition, due to high speed of arrival, average processing time for each instance of data should be in such a way that…

Data stream; FOS: Computer and information sciences; Computer Science - Computation and Language; Computer science; business.industry; Data stream mining; Sentiment analysis; Big data; Machine Learning (stat.ML); Databases (cs.DB); Data structure; computer.software_genre; Field (computer science); Computer Science - Information Retrieval; Tree (data structure); Computer Science - Databases; Computer Science - Distributed Parallel and Cluster Computing; Analytics; Statistics - Machine Learning; Data mining; Distributed Parallel and Cluster Computing (cs.DC); business; computer; Computation and Language (cs.CL); Information Retrieval (cs.IR)

researchProduct

HUMAN: Hierarchical Universal Modular ANnotator

2020

A lot of real-world phenomena are complex and cannot be captured by single task annotations. This causes a need for subsequent annotations, with interdependent questions and answers describing the nature of the subject at hand. Even in the case a phenomenon is easily captured by a single task, the high specialisation of most annotation tools can result in having to switch to another tool if the task only slightly changes. We introduce HUMAN, a novel web-based annotation tool that addresses the above problems by a) covering a variety of annotation tasks on both textual and image data, and b) the usage of an internal deterministic state machine, allowing the researcher to chain different anno…

FOS: Computer and information sciences; 0303 health sciences; Computer Science - Computation and Language; business.industry; Active learning (machine learning); Computer science; 02 engineering and technology; [INFO] Computer Science [cs]; Modular design; Variety (cybernetics); Task (project management); 03 medical and health sciences; Annotation; Human–computer interaction; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; business; Computation and Language (cs.CL); 030304 developmental biology; Graphical user interface

researchProduct

Designing the Business Conversation Corpus

2020

While the progress of machine translation of written text has come far in the past several years thanks to the increasing availability of parallel corpora and corpora-based training technologies, automatic translation of spoken text and dialogues remains challenging even for modern systems. In this paper, we aim to boost the machine translation quality of conversational texts by introducing a newly constructed Japanese-English business conversation parallel corpus. A detailed analysis of the corpus is provided along with challenging examples for automatic translation. We also experiment with adding the corpus in a machine translation training scenario and show how the resulting system benef…

FOS: Computer and information sciences; 050101 languages & linguistics; Computer Science - Computation and Language; Machine translation; Computer science; business.industry; media_common.quotation_subject; 05 social sciences; Automatic translation; 02 engineering and technology; computer.software_genre; 0202 electrical engineering electronic engineering information engineering; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; 020201 artificial intelligence & image processing; 0501 psychology and cognitive sciences; Conversation; Quality (business); Artificial intelligence; business; computer; Computation and Language (cs.CL); Natural language processing; media_common

researchProduct

A Probabilistic Approach to Pronunciation by Analogy

2011

The relationship between written and spoken words is convoluted in languages with a deep orthography such as English and therefore it is difficult to devise explicit rules for generating the pronunciations for unseen words. Pronunciation by analogy (PbA) is a data-driven method of constructing pronunciations for novel words from concatenated segments of known words and their pronunciations. PbA performs relatively well with English and outperforms several other proposed methods. However, the best published word accuracy of 65.5% (for the 20,000 word NETtalk corpus) suggests there is much room for improvement in it. Previous PbA algorithms have used several different scoring strategies such …

FOS: Computer and information sciences; Computer Science - Computation and Language; 68T50; Computation and Language (cs.CL)

researchProduct

How People Respond to the COVID-19 Pandemic on Twitter: A Comparative Analysis of Emotional Expressions from US and India

2023

The COVID-19 pandemic has claimed millions of lives worldwide and elicited heightened emotions. This study examines the expression of various emotions pertaining to COVID-19 in the United States and India as manifested in over 54 million tweets, covering the fifteen-month period from February 2020 through April 2021, a period which includes the beginnings of the huge and disastrous increase in COVID-19 cases that started to ravage India in March 2021. Employing pre-trained emotion analysis and topic modeling algorithms, four distinct types of emotions (fear, anger, happiness, and sadness) and their time- and location-associated variations were examined. Results revealed significant country …

FOS: Computer and information sciences; Computer Science - Computation and Language; Computation and Language (cs.CL)

researchProduct

Differentiable Disentanglement Filter: an Application Agnostic Core Concept Discovery Probe

2019

It has long been speculated that deep neural networks function by discovering a hierarchical set of domain-specific core concepts or patterns, which are further combined to recognize even more elaborate concepts for the classification or other machine learning tasks. Meanwhile disentangling the actual core concepts engrained in the word embeddings (like word2vec or BERT) or deep convolutional image recognition neural networks (like PG-GAN) is difficult and some success there has been achieved only recently. In this paper we propose a novel neural network nonlinearity named Differentiable Disentanglement Filter (DDF) which can be transparently inserted into any existing neural network layer …

FOS: Computer and information sciences; Computer Science - Computation and Language; Computation and Language (cs.CL)

researchProduct

ASR performance prediction on unseen broadcast programs using convolutional neural networks

2018

In this paper, we address a relatively new task: prediction of ASR performance on unseen broadcast programs. We first propose a heterogeneous French corpus dedicated to this task. Two prediction approaches are compared: a state-of-the-art performance prediction based on regression (engineered features) and a new strategy based on convolutional neural networks (learnt features). We particularly focus on the combination of both textual (ASR transcription) and signal inputs. While the joint use of textual and signal features did not work for the regression baseline, the combination of inputs for CNNs leads to the best WER prediction performance. We also show that our CNN prediction remarkably …

FOS: Computer and information sciences; Computer Science - Computation and Language; Computer science; Speech recognition; Feature extraction; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; Convolutional neural network; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; Task (project management); 0202 electrical engineering electronic engineering information engineering; Task analysis; Performance prediction; 020201 artificial intelligence & image processing; Mel-frequency cepstrum; Transcription (software); Hidden Markov model; Computation and Language (cs.CL); ComputingMilieux_MISCELLANEOUS; 0105 earth and related environmental sciences

researchProduct

Analyzing Learned Representations of a Deep ASR Performance Prediction Model

2018

This paper addresses a relatively new task: prediction of ASR performance on unseen broadcast programs. In a previous paper, we presented an ASR performance prediction system using CNNs that encode both text (ASR transcript) and speech, in order to predict word error rate. This work is dedicated to the analysis of speech signal embeddings and text embeddings learnt by the CNN while training our prediction model. We try to better understand which information is captured by the deep model and its relation with different conditioning factors. It is shown that hidden layers convey a clear signal about speech style, accent and broadcast type. We then try to leverage these 3 types of information …

FOS: Computer and information sciences; Computer Science - Computation and Language; Computer science; Speech recognition; Word error rate; 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; 0202 electrical engineering electronic engineering information engineering; Performance prediction; Leverage (statistics); 020201 artificial intelligence & image processing; Computation and Language (cs.CL); 0105 earth and related environmental sciences

researchProduct

Multilingual Clustering of Streaming News

2018

Clustering news across languages enables efficient media monitoring by aggregating articles from multilingual sources into coherent stories. Doing so in an online setting allows scalable processing of massive news streams. To this end, we describe a novel method for clustering an incoming stream of multilingual documents into monolingual and crosslingual story clusters. Unlike typical clustering approaches that consider a small and known number of labels, we tackle the problem of discovering an ever growing number of cluster labels in an online fashion, using real news datasets in multiple languages. Our method is simple to implement, computationally efficient and produces state-of-the-art …

FOS: Computer and information sciences; Computer Science - Computation and Language; Information retrieval; Computer science; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; 02 engineering and technology; Clustering; Media Monitoring; Computer Science - Information Retrieval; ComputingMethodologies_PATTERNRECOGNITION; Multilingual Methods; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Cluster analysis; Computation and Language (cs.CL); Information Retrieval (cs.IR)

researchProduct